Derivation and utility of schizophrenia polygenic risk associated multimodal MRI frontotemporal network

Schizophrenia is a highly heritable psychiatric disorder characterized by widespread functional and structural brain abnormalities. However, previous association studies between MRI and polygenic risk were mostly ROI-based single modality analyses, rather than identifying brain-based multimodal predictive biomarkers. Based on schizophrenia polygenic risk scores (PRS) from healthy white people within the UK Biobank dataset (N = 22,459), we discovered a robust PRS-associated brain pattern with smaller gray matter volume and decreased functional activation in frontotemporal cortex, which distinguished schizophrenia from controls with >83% accuracy, and predicted cognition and symptoms across 4 independent schizophrenia cohorts. Further multi-disease comparisons demonstrated that these identified frontotemporal alterations were most severe in schizophrenia and schizo-affective patients, milder in bipolar disorder, and indistinguishable from controls in autism, depression and attention-deficit hyperactivity disorder. These findings indicate the potential of the identified PRS-associated multimodal frontotemporal network to serve as a trans-diagnostic gene intermediated brain biomarker specific to schizophrenia.


Power analysis
We also calculated the statistical power for SZ-PRS and the two MRI features used as fusion input (fALFF and GMV) using the G*Power software 1 (http://www.softpedia.com/get/Science-CAD/G-Power.shtml). In this study, the sample size is N = 22773 HCs, and the effect size of the correlation between PRS and fALFF loadings is r = 0.074. Given a significance level of 0.05, this sample size (N = 22773) and this effect size (r = 0.074), the statistical power of the correlation is 1. The same method was applied to the MRI features, yielding a statistical power of 1 for both fALFF and GMV, which is high enough to support accurate and robust conclusions about the detected correlations.
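The power figure above can be checked approximately without G*Power. The following is a minimal sketch that assumes a two-sided test of zero correlation and uses the Fisher z approximation, which is a substitute for the exact routine in G*Power; the function name `correlation_power` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for testing H0: rho = 0 given a true correlation r.

    Assumption: Fisher z approximation, where atanh(r_hat) is roughly normal
    with standard error 1 / sqrt(n - 3). G*Power uses an exact method instead.
    """
    std_normal = NormalDist()
    z_effect = atanh(r) * sqrt(n - 3)           # effect expressed in standard errors
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Probability of landing in either rejection region under the alternative.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Values reported in the text: r = 0.074, N = 22773, alpha = 0.05.
power = correlation_power(r=0.074, n=22773, alpha=0.05)
print(f"approximate power: {power:.4f}")
```

With a sample this large, the effect sits more than 11 standard errors from zero, which is why the power saturates at 1 even for a small correlation.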

Supplementary Figure 2.
Statistical power generated from the G*Power software.

Weak association between PRS and the identified components
The correlation between the identified IC and PRS is not very high, which is expected given the large sample size.
Compared with existing UKB-based studies, however, it is normal for the variance explained to be <1% when correlating SZ-PRS with imaging phenotypes 2,3 , under different SNP thresholds and for all cortical and subcortical areas. These earlier investigations were PRS and ROI-based single-modality association analyses. This is consistent with a study published in Nature in 2022 showing that smaller (sample size) brain-wide association studies have reported larger correlations than the largest effects measured in larger samples 4 .

For the current study
We calculated the direct correlation between SZ-PRS and voxel-wise MRI features throughout the brain (60758 and 90638 voxels for fALFF and GMV, respectively). The maximum absolute correlation r is only 0.03 and 0.028, and the mean r is 0.008 and 0.0006, for fALFF and GMV respectively.
Apart from the voxel-wise correlation between SZ-PRS and MRI features, we also tested the correlation between SZ-PRS and the mean values extracted from the AAL atlas for both fALFF and GMV under different SNP thresholds. Results (Supplementary Fig. 5 and Supplementary Table 1) showed that the variance explained was <1% for all brain areas under 3 different SNP thresholds.
Figure 5. Correlations between SZ PRS and the mean values extracted from AAL atlas for both fALFF and GMV under different SNP thresholds (5.0e-08, 1.0e-04 and 0.05).
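The voxel-wise summary described above (maximum absolute r, mean r, and the corresponding variance explained) can be sketched as follows. This is an illustration on toy data, not the pipeline used in the study; the function names and the toy values are hypothetical, and the mean is taken over signed r values, which is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def voxelwise_summary(prs, features):
    """Correlate PRS with every voxel column of a subjects-by-voxels table.

    Returns (max |r|, mean r, variance explained by the strongest voxel).
    """
    n_voxels = len(features[0])
    rs = [pearson_r(prs, [subject[v] for subject in features]) for v in range(n_voxels)]
    max_abs_r = max(abs(r) for r in rs)
    mean_r = sum(rs) / len(rs)
    return max_abs_r, mean_r, max_abs_r ** 2


# Toy data: 6 subjects, 3 voxels (hypothetical values).
prs = [0.1, -0.4, 0.8, 1.2, -1.0, 0.3]
features = [
    [0.52, 1.10, 0.31],
    [0.47, 0.95, 0.40],
    [0.55, 1.02, 0.28],
    [0.49, 1.20, 0.35],
    [0.51, 0.90, 0.33],
    [0.53, 1.05, 0.37],
]
max_abs_r, mean_r, var_explained = voxelwise_summary(prs, features)
print(max_abs_r, mean_r, var_explained)

# Variance explained is r squared, so the maxima reported in the text
# (|r| = 0.03 and 0.028) correspond to about 0.09% and 0.08%.
print(0.03 ** 2, 0.028 ** 2)
```

Because variance explained is the square of r, any |r| below 0.1 stays under the 1% level discussed in this section.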

Linear projection
In this study, we further tested the replicability of the associations detected between PRS and multimodal components within UKB, i.e., whether the association between PRS and the same pattern can be detected in an independent SZ dataset, by performing a cross-site linear projection analysis. The results showed that the association between PRS and the frontotemporal pattern identified in UKB could still be detected in independent SZ patients, which means that the association can be replicated.
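One way to picture a linear projection of this kind is a least-squares projection of each subject's map in the independent cohort onto a fixed spatial component estimated elsewhere, followed by a correlation between the resulting loadings and PRS. The sketch below assumes a single component and toy data; the exact projection used in the study is not specified in this section, and all names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def project_loadings(data, component):
    """Least-squares loading of each subject's map on one fixed spatial component.

    For a single component s, the loading of map x is <x, s> / <s, s>.
    """
    norm = sum(c * c for c in component)
    return [sum(x * c for x, c in zip(subject, component)) / norm for subject in data]


# Toy example: a 4-voxel component and maps built from it plus an orthogonal part.
component = [1.0, -1.0, 2.0, 0.0]      # stands in for the pattern from the discovery cohort
orthogonal = [1.0, 1.0, 0.0, 5.0]      # structure unrelated to the component
true_loadings = [0.5, 1.0, 2.0, 3.5]
data = [[k * c + o for c, o in zip(component, orthogonal)] for k in true_loadings]

loadings = project_loadings(data, component)
prs = [-1.2, -0.3, 0.4, 1.1]           # hypothetical PRS of the independent subjects
print(loadings, pearson_r(loadings, prs))
```

The spatial pattern is held fixed, so the independent cohort contributes only subject loadings and the test of association with PRS is not re-fitted to that cohort.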

Null pattern
To examine the null pattern, we permuted the reference vector (PRS) in the supervised fusion analysis. The goal is to compute the null model of spatial patterns that are observed by chance. To do this, we held the imaging variables (e.g. [X1, X2]) constant and permuted the PRS against them, so that each Xi was randomly paired with a reference value. This permuted reference was then used as the reference in the supervised fusion analysis (MCCAR+jICA). By repeating this process a large number of times (500), we obtained 500 fMRI-sMRI covarying patterns associated with the permuted reference. We also recorded the number of times each voxel occurred. Here we present the most frequently occurring voxels (those which occur more than 70% of the time) associated with the permuted PRS, as shown in Supplementary Fig. 6b. Note that the permuted null model of the spatial pattern is different from the identified frontotemporal system (no hippocampal complex or insula was detected in the null pattern), confirming that the identified PRS pattern is specific to the PRS and is not a random null pattern.
frequently occurring (voxels with more than 60% occurrences) covarying pattern associated with 500 times permuted PRS.
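The permutation logic described above can be sketched as below. The fusion step itself (MCCAR+jICA) is not reproduced: `top_voxels` is a deliberately simple, hypothetical stand-in that returns the voxels most strongly covarying with the reference, and the data are random toy values. Only the permute, re-run and count structure is meant to carry over.

```python
import random


def covariance_sum(x, y):
    """Sum of cross-products of deviations (unnormalized covariance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def top_voxels(reference, features, k):
    """Stand-in for the supervised fusion: indices of the k voxels with the
    largest absolute covariance with the reference. NOT MCCAR+jICA."""
    n_voxels = len(features[0])
    scores = [abs(covariance_sum(reference, [s[v] for s in features])) for v in range(n_voxels)]
    return sorted(range(n_voxels), key=lambda v: scores[v], reverse=True)[:k]


def null_voxel_frequency(reference, features, analysis, n_perm=500, seed=0):
    """Fraction of permutations in which each voxel is selected by `analysis`.

    Imaging features are held constant; only the reference is shuffled.
    """
    rng = random.Random(seed)
    counts = [0] * len(features[0])
    for _ in range(n_perm):
        permuted = list(reference)
        rng.shuffle(permuted)
        for v in analysis(permuted, features):
            counts[v] += 1
    return [c / n_perm for c in counts]


# Toy data: 30 subjects, 8 voxels, random values (hypothetical).
rng = random.Random(1)
prs = [rng.gauss(0, 1) for _ in range(30)]
features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(30)]

freq = null_voxel_frequency(prs, features, lambda ref, feats: top_voxels(ref, feats, k=3))
frequent = [v for v, f in enumerate(freq) if f > 0.70]  # 70% threshold from the text
print(freq, frequent)
```

Voxels that recur across many permutations reflect structure in the imaging data rather than in the reference, which is why a null pattern is a useful baseline for the PRS-guided pattern.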

Spatial similarity
Here, we take the fALFF components as an example. We calculated the spatial correlation of the identified PRS-associated components between Fig. 3a and Fig. 3b using only voxels masked at |Z|>T (threshold). First, the spatial maps were transformed into Z scores and masked at |Z|>2. We thus obtained two masks, from Fig. 3a (mask_a) and Fig. 3b (mask_b) respectively, which were used to perform the voxel selection. Only voxels that fell in the union of the masks (mask_a ∪ mask_b, regardless of positive or negative sign) were used to calculate the spatial correlation. The total number of voxels used in calculating the spatial correlation is thus greatly reduced, e.g., from 153594 (whole-brain voxels) to 5635 (T=2). Spatial correlation was finally performed on these commonly identified voxels (5635) between Fig. 3a and Fig. 3b.
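The masking and correlation steps in this section can be sketched as follows, assuming each spatial map is a flat list of voxel values and that Z scores are computed across all voxels of a map. The function names and the toy maps are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(values):
    """Z-transform a spatial map across its voxels."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def masked_spatial_correlation(map_a, map_b, threshold=2.0):
    """Correlate two maps over the union of their |Z| > threshold masks.

    Returns (correlation, number of voxels in the union mask).
    """
    za, zb = zscore(map_a), zscore(map_b)
    keep = [i for i in range(len(za)) if abs(za[i]) > threshold or abs(zb[i]) > threshold]
    return pearson_r([za[i] for i in keep], [zb[i] for i in keep]), len(keep)


# Toy maps with 20 voxels: two peaks each, one of them shared.
map_a = [10.0, 10.0] + [0.0] * 18
map_b = [10.0, 0.0, 10.0] + [0.0] * 17
r, n_kept = masked_spatial_correlation(map_a, map_b, threshold=2.0)
print(r, n_kept)
```

Restricting the calculation to suprathreshold voxels keeps the large number of near-zero background voxels from dominating the similarity estimate.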

Site effect
Site effect on PRS-MRI fusion within UKB
For the MRI data, three sites are available in UKB: Cheadle, Reading and Newcastle. We performed the PRS-guided fusion for each site separately to test the similarity of the identified PRS-associated frontotemporal pattern. The Dice index, equation (1), was used to calculate the percentage overlap of the spatial maps between sites. The Dice index is a statistical validation metric for comparing the spatial similarity of binary images, for example in assessing image segmentation accuracy. We calculated the Dice index of the identified PRS-associated component between two cohorts using only voxels masked at |Z|>2, resulting in two masks, one from the full UKB sample (mask_UKB) and one from a single site (mask_Cheadle, mask_Reading or mask_Newcastle). Only voxels that fell into the union of the two masks (e.g., mask_UKB ∪ mask_Cheadle) were used to calculate the cross-cohort similarity, as shown in equation (1).
The spatial similarities were > 0.70 across UKB, Cheadle, Reading, and Newcastle for both fALFF and GMV components.
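As an illustration of the overlap measure used here, the sketch below assumes the standard Dice definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to binary masks obtained by thresholding Z-scored maps at |Z|>2. The function names and the toy Z maps are hypothetical.

```python
def threshold_mask(z_map, threshold=2.0):
    """Binary mask of voxels whose absolute Z score exceeds the threshold."""
    return [abs(z) > threshold for z in z_map]


def dice_index(mask_a, mask_b):
    """Standard Dice overlap between two binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(mask_a)
    size_b = sum(mask_b)
    if size_a + size_b == 0:
        raise ValueError("both masks are empty; Dice index is undefined")
    overlap = sum(a and b for a, b in zip(mask_a, mask_b))
    return 2 * overlap / (size_a + size_b)


# Toy Z maps for the full sample and a single site (hypothetical values).
z_full = [3.1, -2.4, 0.2, 2.6, -0.5, 0.1, 2.2, -3.0]
z_site = [2.8, -2.1, 0.4, 1.2, -0.3, 2.5, 2.9, -2.7]

mask_full = threshold_mask(z_full)
mask_site = threshold_mask(z_site)
print(dice_index(mask_full, mask_site))
```

Voxels outside both masks contribute to neither the numerator nor the denominator, so the index depends only on voxels in the union of the two masks, in line with the description above.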

Motion effect
Motion on preprocessing
To control for the confounding effects of motion artifacts, several strategies were used. In the fMRI preprocessing procedure, we despiked the fMRI data: nuisance covariates (
Table 7. There were no significant correlations between the mean FD and age, gender, PRS, handedness and ethnicity.

PRS pattern on UKB subset with head motion <0.2mm
We also excluded subjects with FD >0.2mm to obtain a subset of UKB (N = 13490; 60% of subjects had head motion <0.2mm) and performed the fusion with PRS on this subset to test whether the identified multimodal frontotemporal pattern could be replicated. The results (Supplementary Fig. 12b) show that the identified PRS-associated pattern (frontotemporal cortex and thalamus in fALFF, accompanied by thalamus, hippocampus, para-hippocampus and temporal cortex in GMV) can be validated on the UKB subset with FD<0.2mm. This means that head motion is not a major confounding factor for our current fusion results.

Group differences of mean FD between SZ and HC
We calculated the group differences in mean FD between HC and SZ across the 4 SZ cohorts included in this study. Note that there were no significant differences between patients and controls on

Partial correlation
Partial correlation has been proposed as an alternative approach for removing spurious shared variance in correlation analysis 6 . Here, we also performed a partial correlation analysis between the identified component and PRS, regressing out mean FD. The results show that the significance level was not changed by mean FD (p = 5.2e-30* for fALFF, p = 2.3e-28* for GMV, as in Fig. 2b).
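For a single covariate, the partial correlation described above has a closed form that is equivalent to correlating the residuals of both variables after regressing out the covariate. The sketch below illustrates it on toy vectors; the variable names and values are hypothetical and stand in for component loadings, PRS and mean FD.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, covariate):
    """Correlation between x and y after removing the linear effect of one covariate."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, covariate)
    r_yz = pearson_r(y, covariate)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy case in which the covariate drives all of the shared variance.
mean_fd = [1.0, 1.0, -1.0, -1.0]
loadings = [2.0, 0.0, 0.0, -2.0]   # covariate plus a part unrelated to prs
prs = [2.0, 0.0, -2.0, 0.0]        # covariate plus a part unrelated to loadings

print(pearson_r(loadings, prs))                     # raw correlation
print(partial_correlation(loadings, prs, mean_fd))  # after removing mean FD
```

In the toy case the raw correlation is entirely due to the covariate, so the partial correlation collapses towards zero; an association that survives this adjustment, as reported in the text, is not explained by the covariate.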

fALFF not functional connectivity
Furthermore, fMRI data were spatially smoothed with a 6 mm full width at half maximum (FWHM) Gaussian filter. To calculate the fractional amplitude of low frequency fluctuations (fALFF) 7 , the sum of the amplitude values in the 0.01 to 0.08 Hz low-frequency power range was divided by the sum of the amplitudes over the entire detectable power spectrum (range: 0-0.25 Hz) 8 . The fusion analysis was therefore conducted on the spatial maps of fALFF, not on functional connectivity. Previous fMRI studies found that it is the functional connectivity derived from rs-fMRI that is more sensitive to head motion 9-13 .
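The fALFF ratio defined above can be sketched for a single voxel time series as follows. This is a minimal illustration using a direct discrete Fourier transform; the repetition time of 2 s is an assumption chosen so that the Nyquist frequency matches the 0.25 Hz upper limit quoted in the text, and the function name and test signals are hypothetical.

```python
import cmath
import math


def falff(time_series, tr, low=0.01, high=0.08):
    """Sum of spectral amplitudes in [low, high] Hz divided by the sum over
    all positive frequencies up to the Nyquist frequency (1 / (2 * tr))."""
    n = len(time_series)
    mu = sum(time_series) / n
    x = [v - mu for v in time_series]       # remove the mean (zero-frequency term)
    band = total = 0.0
    for k in range(1, n // 2 + 1):          # positive-frequency bins up to Nyquist
        freq = k / (n * tr)
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amplitude = abs(coeff)
        total += amplitude
        if low <= freq <= high:
            band += amplitude
    return band / total


tr = 2.0   # seconds; assumed, gives a 0.25 Hz Nyquist frequency
n = 200    # number of volumes in the toy series
slow = [math.sin(2 * math.pi * 0.05 * t * tr) for t in range(n)]  # inside the band
fast = [math.sin(2 * math.pi * 0.20 * t * tr) for t in range(n)]  # outside the band

print(falff(slow, tr), falff(fast, tr))
```

Because fALFF is a ratio computed within each voxel's own spectrum, it yields one spatial map per subject, which is the input that the fusion analysis operates on.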
Collectively, (1) there was no group difference in head motion between SZ and HC; (2) there was no significant correlation between mean FD and PRS; (3) the partial correlation between the identified component and PRS remained significant after regressing out mean FD; (4) the PRS pattern was replicated in the UKB subset with head motion <0.2mm; and (5) the current fusion analysis was based on fALFF, not functional connectivity. We therefore believe that micro-motion was not a major factor affecting the current results.

Classification on scanning site
There are 4 independent SZ cohorts (BSNIP, COBRE, fBIRN, MPRC) included in the current study, and they comprise different numbers of sites: 5 sites for BSNIP, 1 site for COBRE, 7 sites for fBIRN and 3 sites for MPRC. Since COBRE is a single-site cohort, the classification of scanning sites was performed for BSNIP (class=3), fBIRN (class=7) and MPRC (class=3). The mean fALFF/GMV plus the first 5 PCs within the positive and negative PRS-associated brain networks were used as input features, and site was treated as the label in the SVM classifications.
Results (Supplementary Fig. 16) showed that all classification accuracies were around 50%, comparable to a randomly distributed accuracy (the more sites, the lower the classification accuracy). This means that site is not a major confounding factor for the current SZ-HC classification results.
Supplementary Figure 16. The classification results on scan sites for BSNIP, fBIRN and MPRC cohorts. Upper row represents ROC; lower row represents confusion matrix.

Site effect on PRS, PANSS and cognition
An ANOVA test (site was used as covariate) showed that there was no site difference in PRS (p=0.96) for the UKB data. The site differences in PANSS and cognition for the independent SZ cohorts are shown in the following Table. Since clinical scores are not available for the MPRC cohort and COBRE is a single-site cohort, these two cohorts were not included in the Table. Collectively, all of the above results indicate that site is not a major confounding factor for the PRS pattern or the classification.

Multimodal imaging parameters and preprocessing
Resting state fMRI
The sMRI data were normalized to MNI space using the unified segmentation method in SPM12, resliced to 3 × 3 × 3 mm, and segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) using modulated normalization algorithms, yielding gray matter volume (GMV) outputs. The GMV maps were then smoothed using a Gaussian kernel with a full width at half maximum (FWHM) of 6 mm. Subject outlier detection was further performed using a spatial Pearson correlation with the template image to ensure that all subjects were properly segmented.

Cognitive measures
Brief Verbal Learning. This domain score was based on the total number of correctly recalled target words for all three trials on the Semantic Verbal Learning Test z-scores; (5) Visual Learning. This domain score was based on the square-transformed total of the Visual Figure Learning Test z-scores, and (6) Reasoning/Problem Solving. This domain score was based on the square-transformed Maze Solving Test total score z-scores. Finally, the CMINDS composite score was defined as the mean of all six normalized domain scores." The MATRICS Consensus Cognitive Battery (MCCB) system was also launched by NIMH and contains one more domain (social cognition) than CMINDS. As reported earlier 14 , CMINDS is very similar to MATRICS in measuring cognitive deficits in SZ. The detailed differences between the CMINDS and MCCB tasks have been described previously 14 .